-
Serverless functions can be spun up in milliseconds and scaled out quickly, forming an ideal platform for quick, interactive parallel queries over large data sets. Modern databases use code generation to produce efficient physical plans, but compiling such a plan on each serverless function is costly: every millisecond spent executing on serverless functions multiplies in cost by the number of functions running. Existing serverless data science frameworks therefore generate and compile code on the client, which precludes specializing this code to patterns that may exist in the input data of individual serverless functions. This paper argues for exploring a trade-off space between one-off code generation on the client, and hyperspecialized compilation that generates bespoke code on each serverless function. Our preliminary experiments show that hyperspecialization outperforms client-based compilation on typical heterogeneous datasets in both cost and performance by 2–4×.
-
Data privacy laws like the EU’s GDPR grant users new rights, such as the right to request access to and deletion of their data. Manual compliance with these requests is error-prone and imposes costly burdens, especially on smaller organizations, as non-compliance risks steep fines. K9db is a new, MySQL-compatible database that complies with privacy laws by construction. The key idea is to make the data ownership and sharing semantics explicit in the storage system. This requires K9db to capture and enforce applications’ complex data ownership and sharing semantics, but in exchange simplifies privacy compliance. Using a small set of schema annotations, K9db infers storage organization, generates procedures for data retrieval and deletion, and reports compliance errors if an application risks violating the GDPR. Our K9db prototype successfully expresses the data sharing semantics of real web applications, and guides developers toward getting privacy compliance right. K9db also matches or exceeds the performance of existing storage systems, at the cost of a modest increase in state size.
-
When network products and services become more valuable as their userbase grows (network effects), this tendency can become a major determinant of how they compete with each other in the market and how the market is structured. Network effects are traditionally linked to high market concentration, early-mover advantages, and entry barriers, and in the market they have also been used as a valuation tool. The recent resurgence of Bitcoin has been partly attributed to network effects, too. We study the existence of network effects in six cryptocurrencies from their inception to obtain a high-level overview of the application of network effects in the cryptocurrency market. We show that, contrary to the usual implications of network effects, they do not serve to concentrate the cryptocurrency market, nor do they accord any one cryptocurrency a definitive competitive advantage, nor are they consistent enough to be reliable valuation tools. Therefore, while network effects do occur in cryptocurrency networks, they are not (yet) a defining feature of the cryptocurrency market as a whole.
-
Today's data science pipelines often rely on user-defined functions (UDFs) written in Python. But interpreted Python code is slow, and Python UDFs cannot be compiled to machine code easily. We present Tuplex, a new data analytics framework that just-in-time compiles developers' natural Python UDFs into efficient, end-to-end optimized native code. Tuplex introduces a novel dual-mode execution model that compiles an optimized fast path for the common case, and falls back on slower exception code paths for data that fail to match the fast path's assumptions. Dual-mode execution is crucial to making end-to-end optimizing compilation tractable: by focusing on the common case, Tuplex keeps the code simple enough to apply aggressive optimizations. Thanks to dual-mode execution, Tuplex pipelines always complete even if exceptions occur, and Tuplex's post-facto exception handling simplifies debugging. We evaluate Tuplex with data science pipelines over real-world datasets. Compared to Spark and Dask, Tuplex improves end-to-end pipeline runtime by 5-91x and comes within 1.1-1.7x of a hand-optimized C++ baseline. Tuplex outperforms other Python compilers by 6x and competes with prior, more limited query compilers. Optimizations enabled by dual-mode processing improve runtime by up to 3x, and Tuplex performs well in a distributed setting on serverless functions.
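The dual-mode idea described in this abstract can be illustrated with a small pure-Python sketch. This is not Tuplex's API or implementation (Tuplex compiles to native code); the names `run_dual_mode`, `fast_path`, `slow_path`, and `KM_PER_MILE` are hypothetical and chosen only for illustration. The sketch assumes a fast path that handles well-formed rows, a slower general path for rows that violate the fast path's assumptions, and a list of failures reported after the run instead of aborting it.

```python
from typing import Any, Callable, Iterable, List, Tuple

KM_PER_MILE = 1.609  # illustrative constant, not taken from the source


def fast_path(row: Any) -> float:
    # Common case (assumed): the row is a plain numeric string such as "12.5".
    return float(row) * KM_PER_MILE


def slow_path(row: Any) -> float:
    # General case (assumed): tolerate whitespace and a trailing "mi" unit.
    if row is None:
        raise ValueError("missing value")
    text = str(row).strip().lower()
    if text.endswith("mi"):
        text = text[:-2].strip()
    return float(text) * KM_PER_MILE


def run_dual_mode(
    rows: Iterable[Any],
    fast: Callable[[Any], float],
    slow: Callable[[Any], float],
) -> Tuple[List[float], List[Tuple[int, Any, Exception]]]:
    results: List[Tuple[int, float]] = []
    deferred: List[Tuple[int, Any]] = []
    failures: List[Tuple[int, Any, Exception]] = []

    # Pass 1: run every row through the fast path; set aside rows that raise.
    for index, row in enumerate(rows):
        try:
            results.append((index, fast(row)))
        except (TypeError, ValueError):
            deferred.append((index, row))

    # Pass 2: retry the deferred rows on the slower, more general path.
    for index, row in deferred:
        try:
            results.append((index, slow(row)))
        except Exception as exc:
            # Record the failure for later inspection; the run still completes.
            failures.append((index, row, exc))

    results.sort(key=lambda pair: pair[0])  # restore input order
    return [value for _, value in results], failures


values, failures = run_dual_mode(["1", "2 mi", None, "abc"], fast_path, slow_path)
```

In this sketch, the first row succeeds on the fast path, the second is recovered by the slow path, and the last two end up in `failures` with their row index, which loosely mirrors the "pipelines always complete" and post-facto exception handling properties the abstract describes.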
-
Recently, there have been several proposals to develop visual recommendation systems. The most advanced systems aim to recommend visualizations that help users find new correlations or identify an interesting deviation based on the current context of the user's analysis. However, when recommending a visualization to a user, there is an inherent risk of visualizing random fluctuations rather than solely true patterns: a problem largely ignored by current techniques. In this paper, we present VizCertify, a novel framework to improve the performance of visual recommendation systems by quantifying the statistical significance of recommended visualizations. The proposed methodology makes it possible to control the probability of misleading visual recommendations using both classical statistical testing procedures and a novel application of the Vapnik-Chervonenkis (VC) dimension to visualization recommendation, which results in an effective criterion to decide whether a recommendation corresponds to a true phenomenon or not.
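As a rough illustration of what controlling the probability of misleading recommendations can mean, the sketch below applies a Bonferroni correction, one classical multiple-testing procedure. The abstract does not say which classical procedures VizCertify uses, so this is an assumption made for illustration only, and the VC-dimension approach is not shown. The function name `bonferroni_select`, the candidate names, and the p-values are all hypothetical.

```python
from typing import Dict, List


def bonferroni_select(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Keep only candidates whose p-value passes the Bonferroni threshold.

    With m candidates tested, each one is compared against alpha / m, which
    bounds the probability of at least one false positive by alpha.
    """
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [name for name, p in p_values.items() if p <= threshold]


# Hypothetical p-values for three candidate visualizations.
candidates = {
    "age_vs_income": 0.001,
    "zip_vs_height": 0.03,
    "tenure_vs_churn": 0.012,
}
significant = bonferroni_select(candidates)
```

With three candidates the per-test threshold is 0.05 / 3, so a candidate with p = 0.03 would pass an uncorrected 0.05 test but is filtered out here.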